
    Heuristics-based query optimisation for SPARQL

    Query optimization in RDF stores is a challenging problem, as SPARQL queries typically contain many more joins than equivalent relational plans and hence lead to a large join-order search space. In such cases, cost-based query optimization is often not possible. One practical reason is that statistics are typically missing in web-scale settings such as Linked Open Data (LOD). The more profound reason is that, due to the absence of schematic structure in RDF, join-hit-ratio estimation requires complicated forms of correlated join statistics, and currently there are no methods to identify the relevant correlations beforehand. For this reason, good heuristics are essential in SPARQL query optimization, even when they are only partially combined with cost-based statistics (i.e., hybrid query optimization). In this paper we describe a set of useful heuristics for SPARQL query optimizers. We present these in the context of a new Heuristic SPARQL Planner (HSP) that is capable of exploiting the syntactic and structural variations of the triple patterns in a SPARQL query in order to choose an execution plan without the need for any cost model. For this, we define the variable graph and show a reduction of the SPARQL query optimization problem to the maximum weight independent set problem. We implemented our planner on top of the MonetDB open-source column store and evaluated its effectiveness against the state-of-the-art RDF-3X engine, as well as comparing the plan quality with a relational (SQL) equivalent of the benchmarks.
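The reduction mentioned in the abstract can be made concrete with a small sketch. The construction below is an illustrative simplification, not HSP's actual algorithm: the node weights stand in for hypothetical selectivity-style scores, edges connect variables that co-occur, and a greedy pass approximates the maximum weight independent set.

```python
# Hypothetical sketch of the combinatorial core: a greedy approximation
# of maximum weight independent set (the problem the paper reduces
# SPARQL join ordering to). The graph construction is illustrative,
# not HSP's exact variable-graph definition.

def greedy_mwis(weights, edges):
    """Repeatedly take the highest-weight node that is not adjacent
    to any node already chosen."""
    adj = {v: set() for v in weights}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    chosen = set()
    for v in sorted(weights, key=weights.get, reverse=True):
        if not adj[v] & chosen:
            chosen.add(v)
    return chosen

# Toy variable graph: nodes are query variables; weights could encode
# how selective the triple patterns mentioning each variable look.
weights = {"?s": 3, "?p": 1, "?o": 2, "?x": 2}
edges = [("?s", "?p"), ("?s", "?o"), ("?o", "?x")]
print(greedy_mwis(weights, edges))  # -> {'?s', '?x'}
```

Greedy selection gives no optimality guarantee in general, but it runs in near-linear time, which matches the paper's goal of planning without a cost model.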


    KP-LAB Knowledge Practices Laboratory -- Specifications and Prototype of the Knowledge Repository (V.3.0) and the Knowledge Mediator (V.3.0)

    This deliverable reports the technical and research development performed up to M36 (January 2009) within tasks T5.2 and T5.4 of WP5 in the KP-Lab project, per the latest Description of Work (DoW) 3.2 [DoW3.2]. The described components are included in the KP-Lab Semantic Web Knowledge Middleware (SWKM) Prototype Release 3.0, released in M36. This release builds on Prototype Release 2.0, presented in [D5.4]. The present deliverable includes both the specification and the implementation details of the described components. The features of the new functionalities are described based on the motivating scenarios and the subsequent functional requirements. The focus and high-level objective of the new services is to provide improved scalability and modularity for the existing services, as well as improved management capabilities over conceptualizations. The implementation of the services is described by providing the service signatures, their intended usage, the accepted input parameters, and their preconditions and effects. First, we describe the Delete Service, a Knowledge Repository service that removes existing namespaces from the repository; this removal covers both the contents of those namespaces and any references to the namespaces themselves that exist in the repository. This new service enhances the SWKM's management capabilities over conceptualizations. Then we describe the Named Graphs functionality, a new feature that allows very flexible modularization of the information found in RDF KBs. We detail the semantics of this feature, the offered capabilities for querying and updating RDF KBs that include modularization information (i.e., information on named graphs), and the implications of their use.
    Finally, in the context of the Knowledge Mediator, we present the Persistent Comparison Service, a variation of the existing (main-memory) Comparison Service (see the M24 release, [D5.3], [D5.4]); unlike the original version, the new service works exclusively on persistent storage, guaranteeing improved scalability.
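The named-graph modularization and namespace deletion described above can be sketched with a toy quad store. All class and method names below are hypothetical illustrations for exposition, not the actual SWKM service signatures.

```python
# Toy quad store illustrating named-graph modularization and a
# Delete-Service-style namespace removal. Names are hypothetical,
# not the SWKM API.

class QuadStore:
    def __init__(self):
        # Each statement is stored as (subject, predicate, object, graph),
        # so the same triple may live in several named graphs.
        self.quads = set()

    def add(self, s, p, o, graph):
        self.quads.add((s, p, o, graph))

    def query(self, graph=None):
        """Return triples, optionally restricted to one named graph."""
        return {(s, p, o) for (s, p, o, g) in self.quads
                if graph is None or g == graph}

    def delete_namespace(self, ns):
        """Drop every statement whose terms or graph use the namespace,
        mirroring the deliverable's Delete Service: the contents and all
        references to the namespace disappear together."""
        self.quads = {q for q in self.quads
                      if not any(t.startswith(ns) for t in q)}

store = QuadStore()
store.add("ex:alice", "ex:knows", "ex:bob", "g:social")
store.add("ex:alice", "old:role", "old:Admin", "g:roles")
store.delete_namespace("old:")
print(store.query("g:social"))  # the g:social triple survives
```

Keeping the graph name as a fourth column is one common way to realize named-graph semantics; it makes per-graph querying and whole-namespace deletion simple set filters.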